Totally data-driven duration modeling based on generalized linear model for Mandarin TTS

Authors

  • Lifu Yi
  • Jian Li
  • Xiaoyan Lou
  • Jie Hao
Abstract

This paper proposes a totally data-driven duration modeling method for Mandarin TTS. It uses generalized linear models (GLM) to model duration and stepwise regression to select the attribute set automatically according to statistical measures, which yields a good tradeoff between model complexity and goodness of fit. In addition, speaking rate is introduced as a new modeling attribute; it not only improves performance but also provides a novel way to adjust speaking rate at synthesis time. We also propose using R² to compare modeling performance fairly across different databases, since R² is the fraction of variance explained by a model. Experiments show that GLM performs significantly better than CART. With much smaller models and a much smaller corpus, the proposed method also achieves results comparable to those reported in other studies.
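The attribute selection and evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a Gaussian GLM with identity link (ordinary least squares), forward-only stepwise selection scored by BIC, and synthetic data in which the columns of `X` stand in for hypothetical linguistic attributes such as speaking rate. The abstract does not state the actual selection criterion, response distribution, or attribute set.

```python
import numpy as np

def fit_ols(X, y, cols):
    """Least-squares fit with an intercept on the chosen columns; returns the RSS."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

def bic(rss, n, k):
    """BIC for a Gaussian model with k fitted coefficients (assumed criterion)."""
    return n * np.log(rss / n) + k * np.log(n)

def forward_stepwise(X, y):
    """Add one attribute at a time while doing so lowers the BIC."""
    n, p = X.shape
    selected = []
    best = bic(fit_ols(X, y, selected), n, 1)  # intercept-only model
    while len(selected) < p:
        scores = []
        for j in range(p):
            if j in selected:
                continue
            rss = fit_ols(X, y, selected + [j])
            scores.append((bic(rss, n, len(selected) + 2), j))
        score, j = min(scores)
        if score >= best:  # no candidate improves the criterion
            break
        best = score
        selected.append(j)
    return selected

def r_squared(X, y, cols):
    """Fraction of the variance of y explained by the fitted model."""
    rss = fit_ols(X, y, cols)
    tss = float(((y - y.mean()) ** 2).sum())
    return 1.0 - rss / tss

# Synthetic stand-in for (log-)duration data: only attributes 0 and 2 matter.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 5))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + 0.5 * rng.normal(size=n)

selected = forward_stepwise(X, y)
r2 = r_squared(X, y, selected)
print("selected attributes:", selected)
print("R^2:", round(r2, 3))
```

Because R² is normalized by the total variance of the target, it can be compared across corpora whose durations have different spreads, which is the motivation given in the abstract for using it instead of a raw error measure.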


Similar resources

A Unified Totally-Data-Driven Framework for Duration and Intonation Modeling

This paper proposes a unified framework for duration and intonation modeling in Mandarin TTS. In this framework, we design a novel parametric representation of Mandarin intonation based on orthogonal polynomial approximation. With this representation, we can decompose the F0 vector into 3 orthogonal polynomial parameters that are continuous scalars. Based on this vector-to-scalar decomposition, we ca...


The Toshiba Mandarin TTS System for the Blizzard Challenge 2008

This paper describes the Toshiba Mandarin Text-to-Speech (TTS) system that was submitted to the Blizzard Challenge 2008. The front-end of the system uses machine-learning approaches such as generalized linear models (GLM) and Quantification Method Type 1 (QMT1) to predict pause, duration and F0 contour. According to the predicted prosody information, the back-end of the system uses Toshiba’s ow...


Totally data-driven intonation prediction model using a novel F0 contour parametric representation

This paper proposes a novel parametric representation of Mandarin intonation based on orthogonal polynomial approximation. The polynomial is a simplified representation of the Parallel Encoding and Target Approximation (PENTA) intonation model, which includes a target component and an approximation component. We also propose predicting the polynomial parameters from linguistic and phonetic attributes...


Phrase break prediction using logistic generalized linear model

In this paper we propose a novel phrase break prediction model for Mandarin speech synthesis. The model is a generalized linear model (GLM) fitted with stepwise regression. We assume that phrase breaks follow a Bernoulli distribution and model the phrase break probability with a logistic GLM. The attribute set is selected automatically by stepwise regression, which is a totally data-driven method. We also introd...


Modeling the Acoustic Correlates of Dialog Act for Expressive Chinese TTS Synthesis

This paper proposes a novel approach for describing the expressivity of dialog text and modeling its acoustic correlates for expressive text-to-speech (TTS) synthesis. We use Dialog Acts (DAs) to describe expressivity. In particular, we set up a Wizard-of-Oz (WoZ) data collection framework to collect a tourism-domain corpus and annotated the DAs. A Pitch Target model which is opt...



Publication date: 2006